More Covariance Functions
Nugget Covariance
\[ \text{Cov}(y_{t_i}, y_{t_j}) = \sigma^2 {1}_{\{h=0\}} \text{ where } h = |t_i - t_j|\]
(- / Powered / Square) Exponential Covariance
\[ \text{Cov}(y_{t_i}, y_{t_j}) = \sigma^2\exp\left(-(h\,l)^p\right) \text{ where } h = |t_i - t_j|\]
Matern Covariance
\[ \text{Cov}(y_{t_i}, y_{t_j}) = \sigma^2 ~ \frac{2^{1-\nu}}{\Gamma(\nu)} ~ \left(\sqrt{2\nu}\, h \cdot l\right)^\nu ~ K_\nu\left(\sqrt{2\nu} \, h \cdot l\right) \text{ where } h = |t_i - t_j|\]
Matern Covariance
\(K_\nu()\) is the modified Bessel function of the second kind.
A Gaussian process with Matérn covariance has sample functions that are \(\lceil \nu -1\rceil\) times differentiable.
When \(\nu = 1/2 + p\) for \(p \in \mathbb{N}^+\) then the Matern has a simplified form
When \(\nu = 1/2\) the Matern is equivalent to the exponential covariance.
As \(\nu \to \infty\) the Matern converges to the squared exponential covariance.
Rational Quadratic Covariance
\[ \text{Cov}(y_{t_i}, y_{t_j}) = \sigma^2 \left(1 + \frac{h^2 \, l^2}{\alpha}\right)^{-\alpha} \text{ where } h = |t_i - t_j|\]
Rational Quadratic Covariance
is a scaled mixture of squared exponential covariance functions with different characteristic inverse length-scales (\(l\) ).
As \(\alpha \to \infty\) the rational quadratic converges to the square exponential covariance.
Has sample functions that are infinitely differentiable for any value of \(\alpha\)
Spherical Covariance
\[
\text{Cov}(y_{t_i}, y_{t_j}) = \begin{cases}
\sigma^2\left(1 - \frac{3}{2} h \cdot l + \frac{1}{2} (h \cdot l)^3)\right) & \text{if } 0 < h < 1/l \\
0 & \text{otherwise}
\end{cases}
\]
Periodic Covariance
\[ \text{Cov}(y_{t_i}, y_{t_j}) = \sigma^2 \exp\left(-2\, l^2 \sin^2\left(\pi\frac{h}{p}\right)\right) \text{ where } h = |t_i - t_j| \]
Linear Covariance
\[ \text{Cov}(y_{t_i}, y_{t_j}) = \sigma^2_b + \sigma^2_v~(t_i-c)(t_j-c)\]
Combining Covariances
If we definite two valid covariance functions, \(Cov_a(y_{t_i}, y_{t_j})\) and \(Cov_b(y_{t_i}, y_{t_j})\) then the following are also valid covariance functions,
\[
\begin{aligned}
Cov_a(y_{t_i}, y_{t_j}) + Cov_b(y_{t_i}, y_{t_j}) \\
\\
Cov_a(y_{t_i}, y_{t_j}) \times Cov_b(y_{t_i}, y_{t_j})
\end{aligned}
\]
Linear \(\times\) Linear \(\to\) Quadratic
\[ Cov_a(y_{t_i}, y_{t_j}) = 1 + 2~(t_i \times t_j) \] \[ Cov_b(y_{t_i}, y_{t_j}) = 2 + 1~(t_i \times t_j) \]
Linear \(\times\) Periodic
\[ Cov_a(y_{t_i}, y_{t_j}) = 1 + 1~(t_i \times t_j) \] \[ Cov_b(y_{t_i}, y_{t_j}) = \exp\left(-2\, \sin^2\left(2\pi\,h\right)\right) \]
Linear + Periodic
\[ Cov_a(y_{t_i}, y_{t_j}) = 1 + 1~(t_i \times t_j) \] \[ Cov_b(h = |t_i - t_j|) = \exp\left(-2\, \sin^2\left(2\pi\,h\right)\right) \]
Sq Exp \(\times\) Periodic \(\to\) Locally Periodic
\[ Cov_a(h = |t_i - t_j|) =\exp(-(1/3)h^2) \] \[ Cov_b(h = |t_i - t_j|) = \exp\left(-2\, \sin^2\left(\pi\,h\right)\right) \]
Sq Exp (short) + Sq Exp (long)
\[ Cov_a(h = |t_i - t_j|) = (1/4) \exp(-4\sqrt{3}h^2) \] \[ Cov_b(h = |t_i - t_j|) = \exp(-(\sqrt{3}/2)h^2) \]
Seen another way
Births (one year)
Smooth long term trend (sq exp cov )
Seven day periodic trend with decay (periodic x sq exp cov )
Constant mean
Birth data
Model (JAGS)
model = "model{
y ~ dmnorm(rep(0,N), inverse(Sigma))
for (i in 1:(length(y)-1)) {
for (j in (i+1):length(y)) {
k1[i,j] <- sigma2[1] * exp(- pow(l[1] * d[i,j],2))
k2[i,j] <- sigma2[2] * exp(- pow(l[2] * d[i,j],2) - 2 * pow(l[3] * sin(pi*d[i,j] / per), 2))
Sigma[i,j] <- k1[i,j] + k2[i,j]
Sigma[j,i] <- Sigma[i,j]
}
}
for (i in 1:length(y)) {
Sigma[i,i] <- sigma2[1] + sigma2[2] + sigma2[3]
}
for(i in 1:3){
sigma2[i] ~ dt(0, 2.5, 1) T(0,)
l[i] ~ dt(0, 2.5, 1) T(0,)
}
}"
Model fitting
m = rjags:: jags.model (
textConnection (model),
data = list (
y = births$ scaled_log_births,
d = fields:: rdist (births$ day_of_year / max (births$ day_of_year)),
per = 7 / max (births$ day_of_year),
pi = pi,
N = nrow (births)
),
n.adapt= 5000 ,
n.chains = 1
)
gp_coda = rjags:: coda.samples (
m, variable.names= c ("sigma2" , "l" ),
n.iter= 5000 ,
thin= 5
)
Component Contributions
We can view our GP in the following ways (marginal form),
\[ \boldsymbol{y} \sim N(\boldsymbol{\mu},~ \boldsymbol{\Sigma}_1 + \boldsymbol{\Sigma}_2 + \sigma^2 \boldsymbol{I}\,) \]
but with appropriate conditioning we can also think of \(\boldsymbol{y}\) as being the sum of multiple independent GPs (latent form)
\[ \boldsymbol{y} = \boldsymbol{\mu} + w_1(\boldsymbol{t}) + w_2(\boldsymbol{t}) + w_3(\boldsymbol{t})\] where \[
\begin{aligned}
w_1(\boldsymbol{t}) &\sim N(0, \boldsymbol{\Sigma}_1) & (\text{sq exp covariance}) \\
w_2(\boldsymbol{t}) &\sim N(0, \boldsymbol{\Sigma}_2) & (\text{periodic x sq exp cov})\\
w_3(\boldsymbol{t}) &\sim N(0, \sigma^2 \boldsymbol{I}\,) & (\text{nugget cov / white noise})
\end{aligned}
\]
Decomposition of Covariance Components
\[
\begin{bmatrix}
y \\
w_1 \\
w_2
\end{bmatrix}
\sim N \left(
\begin{bmatrix}
\boldsymbol{\mu} \\ 0 \\ 0
\end{bmatrix},~
\begin{bmatrix}
\Sigma_1 + \Sigma_2 + \sigma^2 \boldsymbol{I} & \Sigma_1 & \Sigma_2 \\
\Sigma_1 & \Sigma_1 & 0\\
\Sigma_2 & 0 & \Sigma_2 \\
\end{bmatrix}
\right)
\]
therefore, if we want to know the contribution of \(w_1\) we have the following
\[ w_1 ~|~ \boldsymbol{y},\boldsymbol{\mu},\boldsymbol{\theta} \sim N(\boldsymbol{\mu}_{cond},~ \boldsymbol{\Sigma}_{cond}) \]
\[ \boldsymbol{\mu}_{cond} = 0 + \Sigma_1 ~ (\Sigma_1 + \Sigma_2 + \sigma^2 I)^{-1}(\boldsymbol{y}-\boldsymbol{\mu}) \] \[ \boldsymbol{\Sigma}_{cond} = \Sigma_1 - \Sigma_1 (\Sigma_1 + \Sigma_2 + \sigma^2 \boldsymbol{I})^{-1} {\Sigma_1}^t \]
Covariance calculations
cov_C1 = function (d, sigma2, l, per) {
sigma2[1 ] * exp (- (l[1 ] * d)^ 2 )
}
cov_C2 = function (d, sigma2, l, per) {
sigma2[2 ] * exp (- (l[2 ] * d)^ 2 - 2 * (l[3 ] * sin (pi* d / per))^ 2 )
}
cov_full = function (d, sigma2, l, per) {
cov_C1 (d, sigma2, l, per) +
cov_C2 (d, sigma2, l, per) +
ifelse (abs (d)< 1e-6 , sigma2[3 ] + 1e-6 , 0 )
}
Conditional samples
full = dukestm:: cond_predict (
y= y, x= x, x_pred= x_pred, cov_full, sigma2 = post$ sigma2, l= post$ l, per= 7 / max (births$ day_of_year)
)
comp_C1 = dukestm:: cond_predict (
y= y, x= x, x_pred= x_pred, sigma2 = post$ sigma2, l= post$ l, per= 7 / max (births$ day_of_year),
cov_f_o = cov_full,
cov_f_p = cov_C1,
cov_f_po = cov_C1
)
comp_C2 = dukestm:: cond_predict (
y= y, x= x, x_pred= x_pred, sigma2 = post$ sigma2, l= post$ l, per= 7 / max (births$ day_of_year),
cov_f_o = cov_full,
cov_f_p = cov_C2,
cov_f_po = cov_C2
)
Results
Births (multiple years)
Full stan case study here with code here
slowly changing trend - yearly (sq exp cov )
small time scale trend - monthly (sq exp cov )
7 day periodic - day of week effect (periodic \(\times\) sq exp cov )
365.25 day periodic - day of year effect (periodic \(\times\) sq exp cov )
special days and interaction with weekends (linear cov )
independent Gaussian noise (nugget cov )
constant mean